1. Introduction


Uncertainty may be caused by the ambiguity in the terms 
used to describe a specific situation. It may also be caused 
by skepticism of rules used to describe a course of action or 
by missing and/or erroneous data. [For a small sample of work
done in the area, the reader is referred to (Arciszewski &
Ziarko 1986), (Bobrow et al. 1986), (Wiederhold et al.
1986), (Yager 1984), and (Zadeh 1983).]

To deal with uncertainty, techniques other than classical
logic need to be developed. Although statistics may be the
best tool available for handling likelihood, it is not always
adequate for dealing with knowledge acquisition under
uncertainty. [We refer the reader to Mamdani et al. (1985)
for a study of the limitations of traditional statistical
methods.]

Inadequacies caused by estimating probabilities in
statistical processes can be alleviated through use of the
Dempster-Shafer theory of evidence. [For a sample of works
using the Dempster-Shafer theory see (Shafer 1976), (de
Korvin et al. 1990), (Kleyle & de Korvin 1989), (Strat
1990), and (Yager).] Fuzzy set theory is another tool used to
deal with uncertainty where ambiguous terms are present.
[Articles in (Zadeh 1979, 1981 & 1983) illustrate the numerous
works carried out in fuzzy sets.] Other methods include rough
sets, the theory of endorsements and nonmonotonic logic. [The
work on rough sets is illustrated in (Fibak et al. 1986),
(Grzymala-Busse 1988), and (Mrozek 1985 & 1987). Also, see
(Mrozek 1985) and (Pawlak 1982) for the application of rough
sets to medicine and (Arciszewski & Ziarko 1986) and (Pawlak
1981) for applications to industry.]

J. Grzymala-Busse (1988) has defined the concept of 
lower and upper approximation of a (crisp) set and has used 
that concept to extract rules from a set of examples. We will 
define the fuzzy analogs of lower and upper approximations and 
use these to obtain certain and possible rules from a set of 
examples where the data is fuzzy. Central to these concepts 
will be the idea of the degree to which a fuzzy set A is 
contained in another fuzzy set B, and the degree of 
intersection of set A with set B. These concepts will also
give meaning to the statement "A implies B". The two meanings
will be: 1) if x is certainly in A then it is certainly in B,
and 2) if x is possibly in A then it is possibly in B. Next,
classifications are examined, and it is shown that if a
classification is well externally definable then it is well
internally definable, and if it is poorly externally definable
then it is poorly internally definable, thus generalizing a
result of Grzymala-Busse (1988). Finally, some ideas on how to
define consensus and group opinions to form clusters of rules
are given.

2. Results

We now recall some basic definitions such as lower and
upper approximations and the concept of an information system.

Let U be the universe. Let R be an equivalence relation
on U. Let X be any subset of U. If [x] denotes the equivalence
class of x relative to R, then we define

$$\underline{R}(X) = \{x \in U : [x] \subseteq X\} \quad\text{and}\quad \overline{R}(X) = \{x \in U : [x] \cap X \neq \emptyset\}.$$

$\underline{R}(X)$ is called the lower approximation of X and
$\overline{R}(X)$ is called the upper approximation of X. Then
$\underline{R}(X) \subseteq X \subseteq \overline{R}(X)$. If
$\underline{R}(X) = X = \overline{R}(X)$, then X is called definable.
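
As a small illustration of these two definitions, the following Python sketch computes the lower and upper approximations of a crisp set from a list of equivalence classes. The universe, the grouping key and the set X are made-up values chosen only for the example; they do not come from the paper.

```python
def equivalence_classes(universe, key):
    """Partition `universe` into classes of elements sharing the same key(x)."""
    classes = {}
    for x in universe:
        classes.setdefault(key(x), set()).add(x)
    return list(classes.values())


def lower_approximation(classes, X):
    """Union of the equivalence classes entirely contained in X."""
    return set().union(*(c for c in classes if c <= X))


def upper_approximation(classes, X):
    """Union of the equivalence classes that intersect X."""
    return set().union(*(c for c in classes if c & X))


# Hypothetical universe 1..8, grouped in pairs: {1,2}, {3,4}, {5,6}, {7,8}.
U = set(range(1, 9))
classes = equivalence_classes(U, key=lambda x: (x - 1) // 2)

X = {1, 2, 3}
lower = lower_approximation(classes, X)  # {1, 2}
upper = upper_approximation(classes, X)  # {1, 2, 3, 4}
print(lower, upper)
```

Here X is not definable, because the class {3, 4} meets X without being contained in it.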

An information system is a quadruple (U,Q,V,r) where U is
the universe and Q is a subset of $C \cup D$ where
$C \cap D = \emptyset$. The set C is called the set of
conditions; D is called the set of decisions. We assume here
that Q = C. The set V stands for value and r is a function
from $U \times Q$ into V where r(u,q) denotes the value of
attribute q for element u. The set C naturally induces an
equivalence relation on U by partitioning U into sets over
which all attributes are constant. The set X is called roughly
C-definable if

$$\underline{R}(X) \neq \emptyset \text{ and } \overline{R}(X) \neq U.$$

It will be called internally C-undefinable if

$$\underline{R}(X) = \emptyset \text{ and } \overline{R}(X) \neq U.$$

It will be called externally C-undefinable if

$$\underline{R}(X) \neq \emptyset \text{ and } \overline{R}(X) = U.$$
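
The three cases above can be checked mechanically once the partition induced by the condition attributes is known. The sketch below is one possible way to do so; the attribute names (`size`, `color`) and the table contents are hypothetical, and the label returned when none of the three named cases applies is our own wording, not the paper's.

```python
def partition_by_attributes(table, attributes):
    """Group elements whose values agree on every attribute in `attributes`."""
    classes = {}
    for u, row in table.items():
        classes.setdefault(tuple(row[q] for q in attributes), set()).add(u)
    return list(classes.values())


def definability(classes, X, U):
    """Classify X using the lower/upper approximation conditions in the text."""
    lower = set().union(*(c for c in classes if c <= X))
    upper = set().union(*(c for c in classes if c & X))
    if lower == upper:
        return "definable"
    if lower and upper != U:
        return "roughly definable"
    if not lower and upper != U:
        return "internally undefinable"
    if lower and upper == U:
        return "externally undefinable"
    return "none of the named cases"  # lower empty and upper equal to U


# Hypothetical information table: r(u, q) is table[u][q].
table = {
    1: {"size": "small", "color": "red"},
    2: {"size": "small", "color": "red"},
    3: {"size": "large", "color": "red"},
    4: {"size": "large", "color": "red"},
    5: {"size": "large", "color": "blue"},
    6: {"size": "large", "color": "blue"},
}
U = set(table)
classes = partition_by_attributes(table, ["size", "color"])  # {1,2}, {3,4}, {5,6}

print(definability(classes, {1, 2, 3}, U))     # roughly definable
print(definability(classes, {1, 3}, U))        # internally undefinable
print(definability(classes, {1, 2, 3, 5}, U))  # externally undefinable
```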

Fuzzy sets defined

Next, we define two functions on pairs of fuzzy sets that
will be of importance in the present work.

$$I(A \subset B) = \inf_x \max\{1 - A(x),\, B(x)\} \qquad (1)$$

$$J(A \,\#\, B) = \max_x \min\{A(x),\, B(x)\} \qquad (2)$$

Here A and B denote fuzzy subsets of the same universe. The
function $I(A \subset B)$ measures the degree to which A is
included in B and $J(A \,\#\, B)$ measures the degree to which
A intersects B. It is important to note that for the crisp
case, $I(A \subset B) = 1$ iff $A \subseteq B$ and is 0
otherwise. Similarly, $J(A \,\#\, B) = 1$ iff
$A \cap B \neq \emptyset$.
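
A minimal Python sketch of the two functions follows, assuming fuzzy sets are represented as dictionaries from elements to membership degrees (missing keys count as 0) over a finite universe, so that the infimum in (1) becomes a minimum. The function names and sample sets are our own; the crisp checks mirror the remark above.

```python
def inclusion_degree(A, B, universe):
    """I(A c B): minimum over x of max(1 - A(x), B(x))."""
    return min(max(1.0 - A.get(x, 0.0), B.get(x, 0.0)) for x in universe)


def intersection_degree(A, B, universe):
    """J(A # B): maximum over x of min(A(x), B(x))."""
    return max(min(A.get(x, 0.0), B.get(x, 0.0)) for x in universe)


# Crisp case: A is a subset of B, and C is disjoint from A.
universe = ["a", "b", "c", "d"]
A = {"a": 1.0, "b": 1.0}
B = {"a": 1.0, "b": 1.0, "c": 1.0}
C = {"d": 1.0}
print(inclusion_degree(A, B, universe))     # 1.0, since A is contained in B
print(inclusion_degree(B, A, universe))     # 0.0, since B is not contained in A
print(intersection_degree(A, B, universe))  # 1.0, since A meets B
print(intersection_degree(A, C, universe))  # 0.0, since A and C are disjoint

# Fuzzy case with made-up membership degrees.
F = {"x1": 0.8, "x2": 0.3}
G = {"x1": 0.6, "x2": 0.9}
fuzzy_I = inclusion_degree(F, G, ["x1", "x2"])     # min(max(.2,.6), max(.7,.9)) = .6
fuzzy_J = intersection_degree(F, G, ["x1", "x2"])  # max(min(.8,.6), min(.3,.9)) = .6
```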

The goal is to define the fuzzy terms involved in the 
decision as a function of the terms used in the conditions. 
This is accomplished as a function of how much the decision 
follows the conditions. Let $\{B_i\}$ be a finite family of
fuzzy sets. Let A be a fuzzy set. By a lower approximation of
A through $\{B_i\}$, we mean the fuzzy set

$$\underline{R}(A) = \bigcup_i I(B_i \subset A)\, B_i \qquad (6)$$

The decision making process may be simplified by disregarding
all sets $B_i$ if $I(B_i \subset A)$ is less than some
threshold $\alpha$. Then,

$$\underline{R}(A)_\alpha = \bigcup_i I(B_i \subset A)\, B_i \qquad (7)$$

over all $B_i$ for which $I(B_i \subset A) > \alpha$.

Similarly, we can define the upper approximation of A
through $\{B_i\}$ as

$$\overline{R}(A)_\alpha = \bigcup_i J(B_i \,\#\, A)\, B_i \qquad (8)$$

over all $B_i$ for which $J(B_i \,\#\, A) > \alpha$.

The operators I and J will yield two possible sets of
rules: the certain rules and the possible rules. It is
straightforward to see that if $\{B_i\}$ are crisp equivalence
classes we get the lower and upper approximations as defined
by Grzymala-Busse (1988).
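
The following Python sketch illustrates the fuzzy approximations and the crisp special case just mentioned. It rests on two assumptions that the text does not spell out: the weighted set "I(B_i c A) B_i" is read as B_i truncated at the weight (pointwise minimum), and the union is taken as a pointwise maximum. Fuzzy sets are dictionaries of membership degrees; all names and sample data are ours.

```python
def inclusion_degree(A, B, universe):
    """I(A c B): minimum over x of max(1 - A(x), B(x))."""
    return min(max(1.0 - A.get(x, 0.0), B.get(x, 0.0)) for x in universe)


def intersection_degree(A, B, universe):
    """J(A # B): maximum over x of min(A(x), B(x))."""
    return max(min(A.get(x, 0.0), B.get(x, 0.0)) for x in universe)


def approximation(A, family, universe, weight, alpha=0.0):
    """Pointwise max of the sets B truncated at weight(B, A), kept only if weight > alpha.

    Assumption: the scalar weight acts on B by pointwise minimum.
    """
    result = {x: 0.0 for x in universe}
    for B in family:
        w = weight(B, A, universe)
        if w > alpha:
            for x in universe:
                result[x] = max(result[x], min(w, B.get(x, 0.0)))
    return result


def lower_approximation(A, family, universe, alpha=0.0):
    return approximation(A, family, universe, inclusion_degree, alpha)


def upper_approximation(A, family, universe, alpha=0.0):
    return approximation(A, family, universe, intersection_degree, alpha)


# Crisp check: the classes {1,2} and {3,4} with A = {1,2,3}.
universe = [1, 2, 3, 4]
family = [{1: 1.0, 2: 1.0}, {3: 1.0, 4: 1.0}]
A = {1: 1.0, 2: 1.0, 3: 1.0}
lower = lower_approximation(A, family, universe)  # membership 1 on {1, 2} only
upper = upper_approximation(A, family, universe)  # membership 1 on {1, 2, 3, 4}
print(lower, upper)
```

With crisp classes the weights are 0 or 1, so either reading of the weighting (minimum or product) gives the same crisp lower and upper approximations.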

Determining Fuzzy Rules 

We now show how rules can be obtained from the raw data 
given in Table 1 after converting this data according to the 
professor's evaluation of the performance of the students, 
relative to exams high, exams low, project high, project low, 
and his belief with respect to each student getting an A. (See
Table 2 for the converted data.)


Table 1: Production/Operations Management Grades

Student   Exams (2)   Project              Course Grade
                      (Written & Oral)
   1        75          85                    75.36
   2        94          87                    89.53
   3        88          89.3                  89.93
   4        79.5        95                    78.06
   5        85          97                    90.85
   6        56.5        88.6                  60.89
   7        65          91.6                  76.15
   8        49          76.7                  59.22
   9        63.5        89.1                  69.99
  10        57          76.9                  55.77
  11        70          98                    80.3
  12        93          88                    90.1


It can be observed that none of the course grades was a
strong predictor of "success". In other words, course grades
of 90 or slightly better, taken as a "quality" measure of the
final product, did not give the professor strong belief in
awarding an "A" to the student. The professor's belief in
these grades being the best in the class and therefore
deserving of an "A" grade was approximately .67. The belief in
the lower scores is scaled downward from .67 to .41 (the
latter representing belief that 55.77 will be the top score in
the class).

The professor recognized the high exam scores of 94 and
93, with belief of .99/EH and .98/EH, respectively (EH: Exams
High). The low exam score of 49 was designated .92/EL (EL:
Exams Low) by the professor. Since all project grades were
relatively close and relatively high, the professor saw little
differentiation between the "top" score and the other scores.
The "top" project score is .54 high and .46 low (.54/PH and
.46/PL, respectively). This contrasts with the worst project
score being .43/PH and .59/PL, where .59 is the highest belief
that a project grade is a "low" score. This approach was
considered to be consistent since, although exam grades varied
from 49 to 94, no project grade was below 76.7. It was felt
that keeping the project grades from being too strongly biased
toward "high" would prevent the decision rules from being
overly biased toward high project grades. Enough
differentiation was considered to allow the rough set
formulation to consider both attributes in the decision rules
for awarding a "top" score of "A" to a student. Each student's
scores were translated into belief with respect to EH, EL, PH,
PL and "A".

For our example of twelve POM students, $x_1, x_2, \ldots, x_{12}$,
we let

EH: exams high      PH: project high
EL: exams low       PL: project low      "A": Top Grade



Thus, for the first student, $x_1$, the belief that the exams
were high is .79/EH, and that the exams were low is .60/EL;
that the project grade was high is .47/PH and that it was low
is .53/PL. The strength of belief for an A is .56/"A". In
addition, EH may be viewed as a fuzzy set of students, such
that EH = .79/$x_1$ + .99/$x_2$ + ... + .98/$x_{12}$, where
$x_2$ is an excellent example of EH (.99) while $x_8$ is not
such a good example (.52). (See Table 2 below for all the
professor's evaluative scores.)


Table 2: Professor's Evaluative Scores 



Using our rough set theory formulas as they have been
developed for fuzzy systems of attributes and decisions, we
compute:

$I(EH \subset \text{"A"}) = .41$      $I(EH \cap PH \subset \text{"A"}) = .51$

$I(EL \subset \text{"A"}) = .41$      $I(EH \cap PL \subset \text{"A"}) = .42$

$I(PH \subset \text{"A"}) = .51$      $I(EL \cap PH \subset \text{"A"}) = .51$

$I(PL \subset \text{"A"}) = .42$      $I(EL \cap PL \subset \text{"A"}) = .42$

with a lower approximation for $\alpha = .50$ defined by

$$\underline{R} = .51\, PH \cup .51\, (EH \cap PH).$$

The extracted rules would imply that high project scores 
and high exam scores both impact a high course grade with 
certainty .51. 

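Possibility rules can be determined by computing:

$J(EH \,\#\, \text{"A"}) = .67$      $J(EH \cap PH \,\#\, \text{"A"}) = .54$

$J(EL \,\#\, \text{"A"}) = .59$      $J(EH \cap PL \,\#\, \text{"A"}) = .53$

$J(PH \,\#\, \text{"A"}) = .54$      $J(EL \cap PH \,\#\, \text{"A"}) = .54$

$J(PL \,\#\, \text{"A"}) = .53$      $J(EL \cap PL \,\#\, \text{"A"}) = .53$

with an upper approximation at $\alpha = .60$ defined as:

$$\overline{R} = .67\, EH.$$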
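
The thresholding step for the possibility rules can be sketched in a few lines of Python. The eight weights are the J values reported above; the dictionary keys (with `&` standing for intersection) and the function name are our own labels for illustration.

```python
# J values as reported in the text; "&" stands for the intersection of condition sets.
J_values = {
    "EH": 0.67, "EL": 0.59, "PH": 0.54, "PL": 0.53,
    "EH&PH": 0.54, "EH&PL": 0.53, "EL&PH": 0.54, "EL&PL": 0.53,
}


def select_terms(weights, alpha):
    """Keep only the condition sets whose weight strictly exceeds the threshold alpha."""
    return {name: w for name, w in weights.items() if w > alpha}


upper_terms = select_terms(J_values, alpha=0.60)
print(upper_terms)  # {'EH': 0.67}
```

Only EH survives the threshold, which is why the upper approximation reduces to the single term .67 EH.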

Thus, we can see that the factors dictating the "best" 
in the class are: 

1) If project grades are high, an "A" score will be attained. 
(Certainty = .51) 

2) If project grades and exam grades are high, an "A" score
will be attained. (Certainty = .51)

3) If exam grades are high, an "A" score will be attained. 
(Possibility = .67) 

Indeed, these rules reflect the fact that exam grades 
are more heavily weighted than the project grade toward 
determining the final course grade. Additionally, these two 
grades comprise the majority of the weighted scores from which 
the course grade is calculated. 

Belief & Possibility 

We can use the functions I and J to determine two
meanings of A implies B. The belief that if x is certainly in
A then it is certainly in B is given by:

$$I[\underline{R}(A) \subset \underline{R}(B)] \qquad (9)$$

and the belief that if x is possibly in A then it is possibly
in B can be defined by:

$$J[\overline{R}(A) \,\#\, \overline{R}(B)] \qquad (10)$$

This interpretation follows from the fact that
$\underline{R}(A)$ are objects certainly in A and
$\overline{R}(A)$ are objects possibly in A. We now turn to
the study of classifications.
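
Expressions (9) and (10) compose the functions I and J with the two approximations. The compact Python sketch below evaluates both on a small crisp example. As assumptions of ours, the weighting of each B_i is taken as a pointwise minimum, the union as a pointwise maximum, and the threshold is set to 0; the sets are invented for illustration.

```python
def I(A, B, universe):
    """Inclusion degree: min over x of max(1 - A(x), B(x))."""
    return min(max(1.0 - A.get(x, 0.0), B.get(x, 0.0)) for x in universe)


def J(A, B, universe):
    """Intersection degree: max over x of min(A(x), B(x))."""
    return max(min(A.get(x, 0.0), B.get(x, 0.0)) for x in universe)


def approx(A, family, universe, weight, alpha=0.0):
    """Pointwise max of the sets B_i truncated at weight(B_i, A) when weight > alpha."""
    out = {x: 0.0 for x in universe}
    for B in family:
        w = weight(B, A, universe)
        if w > alpha:
            for x in universe:
                out[x] = max(out[x], min(w, B.get(x, 0.0)))
    return out


def certain_belief(A, B, family, universe):
    """Expression (9): I[lower(A) c lower(B)]."""
    return I(approx(A, family, universe, I), approx(B, family, universe, I), universe)


def possible_belief(A, B, family, universe):
    """Expression (10): J[upper(A) # upper(B)]."""
    return J(approx(A, family, universe, J), approx(B, family, universe, J), universe)


universe = [1, 2, 3, 4]
family = [{1: 1.0, 2: 1.0}, {3: 1.0, 4: 1.0}]
A = {1: 1.0, 2: 1.0}
B = {1: 1.0, 2: 1.0, 3: 1.0}
print(certain_belief(A, B, family, universe))   # 1.0
print(possible_belief(A, B, family, universe))  # 1.0
```

Note that when the lower approximation of A is empty, (9) evaluates to 1 vacuously, so the two beliefs can differ sharply for the same pair of sets.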

Classifications 

The study of classifications is of great interest 
because in learning from examples, the rules are derived from 
classifications generated by simple decisions. In this 
section, we turn our attention to classifications. Of course, 
the traditional meaning is to partition. In our setting, we 
have ill-defined boundaries, so we need to relax the concept 
of partitions by requiring that the sets not overlap too much. 

As earlier, consider a finite family of fuzzy sets,
$\{B_i\}$. Let $\pi$ denote a finite family of fuzzy sets

$$\pi = \{A_1, A_2, \ldots, A_n\}.$$

We define

$$\underline{P}\pi_\alpha = \{\underline{R}(A_1)_\alpha, \ldots, \underline{R}(A_n)_\alpha\},$$

$$\overline{P}\pi_\alpha = \{\overline{R}(A_1)_\alpha, \ldots, \overline{R}(A_n)_\alpha\},$$

where the lower and upper $\alpha$-approximations are generated by
the finite sequence $\{B_i\}$.

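We can develop the following relationship.

$$d^\circ[A = B] = \min\{I(A \subset B),\, I(B \subset A)\}$$

using the following definitions:

$$d^\circ[\underline{P}\pi_\alpha = \pi] = \min_k \{d^\circ[\underline{R}(A_k)_\alpha = A_k]\}$$

$$d^\circ[\overline{P}\pi_\alpha = \pi] = \min_k \{d^\circ[\overline{R}(A_k)_\alpha = A_k]\}$$

$\pi$ will be called $\{B_i\}$-definable to the degree $\beta$
with threshold $\alpha$ if

$$\min\{d^\circ[\underline{P}\pi_\alpha = \pi],\, d^\circ[\overline{P}\pi_\alpha = \pi]\} > \beta.$$

If we define

$$d^\circ[\underline{P}\pi_\alpha = \overline{P}\pi_\alpha] = \min_k \{d^\circ[\underline{R}(A_k)_\alpha = \overline{R}(A_k)_\alpha]\},$$

it can be shown that if $\beta > \tfrac{1}{2}$, then
$d^\circ[\underline{P}\pi_\alpha = \pi] > \beta$ and
$d^\circ[\overline{P}\pi_\alpha = \pi] > \beta$ imply that
$d^\circ[\underline{P}\pi_\alpha = \overline{P}\pi_\alpha] > \beta$.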
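
The degree of equality and its extension to families can be sketched as follows. The fuzzy sets here are invented, and the sets named `approx_1` and `approx_2` merely stand in for approximations that would be computed from some family {B_i}; they are not derived from the paper's data.

```python
def inclusion_degree(A, B, universe):
    """I(A c B): minimum over x of max(1 - A(x), B(x))."""
    return min(max(1.0 - A.get(x, 0.0), B.get(x, 0.0)) for x in universe)


def equality_degree(A, B, universe):
    """Degree of equality: min of the two inclusion degrees."""
    return min(inclusion_degree(A, B, universe), inclusion_degree(B, A, universe))


def family_equality_degree(first, second, universe):
    """Minimum over k of the degree of equality between first[k] and second[k]."""
    return min(equality_degree(F, G, universe) for F, G in zip(first, second))


universe = ["x1", "x2"]
A_1 = {"x1": 0.9, "x2": 0.2}
A_2 = {"x1": 0.1, "x2": 0.9}
approx_1 = {"x1": 0.8, "x2": 0.1}  # hypothetical approximation of A_1
approx_2 = {"x1": 0.1, "x2": 0.9}  # hypothetical approximation of A_2

d_1 = equality_degree(approx_1, A_1, universe)   # min(.9, .8) = .8
d_2 = equality_degree(approx_2, A_2, universe)   # .9
degree = family_equality_degree([approx_1, approx_2], [A_1, A_2], universe)  # .8
beta = 0.5
print(degree > beta)  # True: the family is matched to a degree above beta
```

One design point worth noticing: for a genuinely fuzzy set the degree of equality of a set with itself can be below 1 (here it is .9 for A_2), since max(1 - A(x), A(x)) is 1 only when A(x) is 0 or 1.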

Recall that the following result is shown in information
systems. For classifications, if $\overline{P}A_k$ is the
universal set for each k, then $\underline{P}A_k$ is empty for
each k. Also, if $\underline{P}A_k$ is nonempty for each k,
then $\overline{P}A_k$ is not the universal set for any value
of k. We would like to get the analog of this by showing that
if $\underline{R}(A_k)_\alpha$ "has some substance" for some k,
then $\overline{R}(A_j)_\alpha$ for $j \neq k$ is "not too
large", and if $\overline{R}(A_k)_\alpha$ is "fairly
substantial", $\underline{R}(A_j)_\alpha$ for $j \neq k$ cannot
be "too large". In this sense, the results of Grzymala-Busse
(1988) will be generalized.

We would like $\{A_k\}$ and $\{B_i\}$ to somewhat approximate a
partition. We define the following two conditions:

(*) For every $0 < \epsilon < 1$, there exists $0 < \delta < 1$
such that if $B_i(x_0) > \epsilon$, then $B_l(x_0) < 1 - \delta$
for $l \neq i$.

(**) For every pair j, k with $j \neq k$ and all x,
$A_k(x) + A_j(x) \leq 1$.

Conditions (*) and (**) both express that the overlap is not
too large and obviously hold for partitions. We note that if
(**) holds for $\{B_i\}$ then it implies (*). Indeed, in this
case we pick $\delta = \epsilon$. Thus, the results that follow
may be shown assuming condition (**) for $\{B_i\}$ and $\{A_k\}$.
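
The remark that (**) implies (*) with delta equal to epsilon can be spot-checked numerically. The sketch below tests (**) for a family and then tests (*) for a grid of epsilon values with delta = epsilon; the two families are invented for illustration, and a finite grid is of course only a sanity check, not a proof.

```python
from itertools import permutations


def satisfies_double_star(family, universe):
    """(**): memberships of any two distinct sets sum to at most 1 at every x."""
    return all(
        F.get(x, 0.0) + G.get(x, 0.0) <= 1.0
        for F, G in permutations(family, 2)
        for x in universe
    )


def satisfies_star(family, universe, eps, delta):
    """(*) for one (eps, delta): B_i(x) > eps forces B_l(x) < 1 - delta for l != i."""
    return all(
        G.get(x, 0.0) < 1.0 - delta
        for F, G in permutations(family, 2)
        for x in universe
        if F.get(x, 0.0) > eps
    )


universe = ["x1", "x2"]
family = [{"x1": 0.7, "x2": 0.2}, {"x1": 0.3, "x2": 0.6}]  # hypothetical, low overlap
overlapping = [{"x1": 0.9}, {"x1": 0.8}]                    # hypothetical, violates (**)

eps_grid = [k / 10 for k in range(1, 10)]
star_holds = all(satisfies_star(family, universe, eps, eps) for eps in eps_grid)
print(satisfies_double_star(family, universe), star_holds)  # True True
print(satisfies_double_star(overlapping, universe))         # False
```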

We first show that under conditions (*) and (**),
whenever $\underline{R}(A_k)_\alpha$ is bounded away from 0,
then $\overline{R}(A_j)_\alpha$ for $j \neq k$ is bounded away
from 1. Suppose $\underline{R}(A_k)_\alpha(x_0) > \epsilon$;
then for some i, $I(B_i \subset A_k) > \epsilon$ and
$B_i(x_0) > \epsilon$, so for $l \neq i$, from condition (*),
we have $B_l(x_0) < 1 - \delta$. For any $l \neq i$ we have
$J(B_l \,\#\, A_j)\, B_l(x_0) < 1 - \delta$. Now

$$J(B_i \,\#\, A_j) = 1 - I(B_i \subset \neg A_j);$$

$$I(B_i \subset A_k) = \min_x \max\{1 - B_i(x),\, A_k(x)\};$$

$$I(B_i \subset \neg A_j) = \min_x \max\{1 - B_i(x),\, 1 - A_j(x)\}.$$

Condition (**) implies $I(B_i \subset A_k) \leq I(B_i \subset \neg A_j)$
for all $j \neq k$.

From the above it follows that $J(B_i \,\#\, A_j) < 1 - \epsilon$.
Thus,

$$\overline{R}(A_j)_\alpha(x_0) < \max\{1 - \epsilon,\, 1 - \delta\}.$$

We now show a rough converse to the above. If
$\overline{R}(A_k)_\alpha$ is bounded away from 0, then for
$j \neq k$, $\underline{R}(A_j)_\alpha$ is bounded away from 1.
Suppose $\overline{R}(A_k)_\alpha(x_0) > 1 - \epsilon$ for some
k; then

$$J(B_{i_0} \,\#\, A_k)\, B_{i_0}(x_0) > 1 - \epsilon \text{ for some } i_0.$$

Pick $j \neq k$. Then

$$I(B_{i_0} \subset A_j) = 1 - J(B_{i_0} \,\#\, \neg A_j).$$

Now,

$$J(B_{i_0} \,\#\, \neg A_j) = \max_x \min\{B_{i_0}(x),\, 1 - A_j(x)\};$$

$$J(B_{i_0} \,\#\, A_k) = \max_x \min\{B_{i_0}(x),\, A_k(x)\}.$$

By (**) it follows that $J(B_{i_0} \,\#\, \neg A_j) \geq J(B_{i_0} \,\#\, A_k)$.
From above, $I(B_{i_0} \subset A_j) \leq 1 - J(B_{i_0} \,\#\, A_k) < \epsilon$.

Since $B_{i_0}(x_0) > 1 - \epsilon$, by (*), $B_i(x_0) < \theta$
for $i \neq i_0$ where $0 < \theta < 1$. Therefore,
$\underline{R}(A_j)_\alpha(x_0) < \max\{\epsilon,\, \theta\}$.

Consensus 

We can define consensus between two rows of a table by

$$\text{Consensus}[\text{Row}_i, \text{Row}_j] = \min\{I[\text{Row}_i \subset \text{Row}_j],\, I[\text{Row}_j \subset \text{Row}_i]\}$$

Here, $\text{Row}_i$ and $\text{Row}_j$ are considered to be
fuzzy subsets of the set of all attributes and decisions. If
$\gamma$ is some predetermined threshold, we pick some $x_i$
and then all $x_j$ for which
$\text{Consensus}[\text{Row}_i, \text{Row}_j] > \gamma$. If any
of the x's are left over, we start again with the first x
available. We thus get fuzzy sets $S_1, S_2, \ldots, S_t$ where
$\mu_{S_i}(\ell_i) = 1$ for some $\ell_i$ (which we might call
the leader of $S_i$) and $\mu_{S_i}(x) = \text{Consensus}(\ell_i, x)$
provided $\mu_{S_i}(x)$ exceeds $\gamma$. Within each $S_i$ we
then can recompute the symptoms/decisions for $x_i$, taking
$\mu_{S_i}(x_j)$ into account.

If $1 \leq i \leq t$, then we have i (aggregated) decisions and using
fuzzy cardinality we can compute the "firing strength" of each
block of rules. This approach has the advantage of taking
consensus of opinions into consideration in the decision. The
detailed methodology will be discussed in a later paper.
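
Since the detailed methodology is deferred, the following Python sketch covers only the grouping step as described above: pick the first available row as leader, attach every remaining row whose consensus with the leader exceeds the threshold, and repeat with what is left. The row values, the threshold and all names are hypothetical, and the later recomputation and firing-strength steps are not attempted.

```python
def inclusion_degree(A, B, attributes):
    """I(A c B): minimum over attributes q of max(1 - A(q), B(q))."""
    return min(max(1.0 - A.get(q, 0.0), B.get(q, 0.0)) for q in attributes)


def consensus(row_i, row_j, attributes):
    """Consensus of two rows: min of the two inclusion degrees."""
    return min(
        inclusion_degree(row_i, row_j, attributes),
        inclusion_degree(row_j, row_i, attributes),
    )


def cluster_rows(rows, attributes, gamma):
    """Leader-based grouping: each cluster maps members to their membership degree."""
    remaining = list(rows)
    clusters = []
    while remaining:
        leader = remaining[0]
        cluster = {leader: 1.0}  # the leader has membership 1 by definition
        for name in remaining[1:]:
            degree = consensus(rows[leader], rows[name], attributes)
            if degree > gamma:
                cluster[name] = degree
        remaining = [name for name in remaining if name not in cluster]
        clusters.append(cluster)
    return clusters


# Hypothetical rows viewed as fuzzy subsets of the attributes and the decision.
attributes = ["EH", "PH", "A"]
rows = {
    "x1": {"EH": 0.9, "PH": 0.8, "A": 0.9},
    "x2": {"EH": 0.8, "PH": 0.9, "A": 0.8},
    "x3": {"EH": 0.1, "PH": 0.2, "A": 0.1},
}
clusters = cluster_rows(rows, attributes, gamma=0.5)
print(clusters)  # x1 leads a cluster containing x2; x3 forms its own cluster
```

The result depends on the order in which rows are visited, because the first available row always becomes the leader; this follows the description in the text rather than any order-independent clustering criterion.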